TABLE 2.2
Evaluating the components of Q-DETR-R50 on the VOC dataset.

Method             #Bits      AP50    #Bits    AP50    #Bits    AP50
Real-valued        32-32-32   83.3    -        -       -        -
Baseline           4-4-8      78.0    3-3-8    76.8    2-2-8    69.7
+DA                4-4-8      78.8    3-3-8    78.0    2-2-8    71.6
+FQM               4-4-8      81.5    3-3-8    80.9    2-2-8    74.9
+DA+FQM (Q-DETR)   4-4-8      82.7    3-3-8    82.1    2-2-8    76.4

Note: #Bits (W-A-Attention) denotes the bit-widths of weights, activations, and attention activations. DA denotes the distribution alignment module. FQM denotes foreground-aware query matching.
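Since the #Bits notation is central to reading the table, the following minimal sketch of a symmetric uniform quantizer illustrates what a setting such as 4-4-8 means in practice. It is a generic illustration only, not necessarily Q-DETR's exact quantization scheme; the tensor shapes are arbitrary placeholders.

```python
import torch

def uniform_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor uniform quantization to the given bit-width.

    A generic sketch only; Q-DETR additionally aligns query
    distributions (DA) rather than relying on this naive scheme.
    """
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit signed values
    scale = x.abs().max().clamp(min=1e-8) / qmax  # simplest per-tensor scale
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

# A "4-4-8" configuration: 4-bit weights, 4-bit activations,
# and 8-bit attention activations.
weight = torch.randn(256, 256)
act = torch.randn(8, 256)
attn = torch.softmax(torch.randn(8, 100, 100), dim=-1)

w_q = uniform_quantize(weight, bits=4)
a_q = uniform_quantize(act, bits=4)
attn_q = uniform_quantize(attn, bits=8)
```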
In the 2-bit setting, the DA module improves the baseline by 1.9%, and the FQM achieves a 5.2% performance improvement. When DA and FQM are combined, the improvement reaches 6.7%.
Information analysis. We further show the information plane, following [238], in Fig. 2.12. We adopt the test AP50 to quantify I(y^GT; E, q). We employ a reconstruction decoder to decode the encoded feature E into a reconstruction of the input and quantify I(X; E) using the ℓ1 loss. As shown in Fig. 2.12, the curve of the larger teacher DETR-R101 generally lies to the right of the curves of the smaller student models, indicating a greater capacity for information representation. Likewise, the purple curve (Q-DETR-R50) generally lies to the right of the other three student curves, showing the improvements in information representation brought by the proposed methods.
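As a concrete illustration of how I(X; E) can be proxied by reconstruction quality, the sketch below pairs a lightweight decoder with an ℓ1 loss. The decoder architecture and the feature shapes are assumptions made for illustration; the chapter specifies only that a reconstruction decoder and the ℓ1 loss are used.

```python
import torch
import torch.nn as nn

class ReconstructionDecoder(nn.Module):
    """Decodes encoder features E back to the input image.

    A hypothetical lightweight decoder: five stride-2 transposed
    convolutions upsample a stride-32 feature map to input resolution.
    """
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 3, 3, padding=1),  # project back to RGB
        )

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        return self.up(e)

decoder = ReconstructionDecoder()
x = torch.randn(2, 3, 224, 224)   # input images (assumed resolution)
e = torch.randn(2, 256, 7, 7)     # encoder features E (stride-32 map)

# Lower reconstruction loss indicates that E retains more information
# about X, i.e., a higher I(X; E).
loss = nn.functional.l1_loss(decoder(e), x)
```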